# End-to-end speech model

Voila is a series of large-scale speech-language foundation models designed for human-computer voice interaction.

Transformers, Supports Multiple Languages

Llama3.1 Typhoon2 Audio 8b Instruct

Typhoon 2-Audio Edition is an end-to-end speech-to-speech model that accepts audio, speech, and text inputs and generates text and speech outputs simultaneously. The model is optimized for the Thai language and also supports English.

Transformers, Supports Multiple Languages

FlowMirror is an end-to-end speech model developed by Zhejiang Jingzhunxue AI Lab. It supports tasks such as voice dialogue, ASR, and TTS, with a focus on educational applications.

A Vietnamese text-to-speech model developed by Meta, based on the VITS architecture and intended for high-quality speech synthesis.

Speech Synthesis

W2v Timit Ft 4001

A speech recognition model based on the Wav2Vec 2.0 architecture, fine-tuned on the TIMIT dataset and suited to English speech-to-text tasks.

Speech Recognition
